DNorm: disease name normalization with pairwise learning to rank
Abstract
MOTIVATION: Despite the central role of diseases in biomedical research, there have been far fewer attempts to automatically determine which diseases are mentioned in a text (the task of disease name normalization, or DNorm) than there have been for other normalization tasks in biomedical text mining research.
METHODS: In this article we introduce the first machine learning approach for DNorm, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM. Our method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data. The technique is based on pairwise learning to rank, which has not previously been applied to the normalization task but has proven successful in large optimization problems for information retrieval.
RESULTS: We compare our method with several techniques based on lexical normalization and matching, MetaMap and Lucene. Our algorithm achieves a micro-averaged F-measure of 0.782 and a macro-averaged F-measure of 0.809, increases of 0.121 and 0.098, respectively, over the highest-performing baseline method.
AVAILABILITY: The source code for DNorm is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/DNorm, along with a web-based demonstration and links to the NCBI disease corpus. Results on PubMed abstracts are available in PubTator: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator.
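To make the idea of learning mention-to-name similarities with pairwise learning to rank more concrete, here is a minimal sketch. The abstract does not spell out the scoring function or training procedure, so everything below is an assumption made for illustration: a bilinear score m^T W n over bag-of-words vectors, W initialized to the identity, a margin (hinge) ranking loss, and a tiny hypothetical vocabulary of three concept names. It is not the released DNorm code.

```python
import random

def build_vocab(texts):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Unit-length bag-of-words vector (a stand-in featurization, assumed here)."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec

def score(W, m, n):
    """Bilinear similarity m^T W n between a mention and a concept name."""
    return sum(m[i] * W[i][j] * n[j]
               for i in range(len(m)) if m[i]
               for j in range(len(n)) if n[j])

def train(pairs, names, vocab, epochs=20, lr=0.5, seed=0):
    """pairs: (mention_text, index_of_correct_name). Returns (W, name_vecs)."""
    rng = random.Random(seed)
    d = len(vocab)
    # Identity start: before any learning, the score is plain token overlap.
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    name_vecs = [vectorize(n, vocab) for n in names]
    for _ in range(epochs):
        order = pairs[:]
        rng.shuffle(order)
        for mention, pos in order:
            m = vectorize(mention, vocab)
            for neg in range(len(names)):
                if neg == pos:
                    continue
                # Pairwise constraint: the correct name should outscore
                # every other name by a margin of at least 1.
                loss = 1.0 - score(W, m, name_vecs[pos]) + score(W, m, name_vecs[neg])
                if loss > 0:
                    # Gradient step on the hinge loss for this pair.
                    for i in range(d):
                        if m[i]:
                            for j in range(d):
                                W[i][j] += lr * m[i] * (name_vecs[pos][j] - name_vecs[neg][j])
    return W, name_vecs

def rank(W, mention, names, name_vecs, vocab):
    """Return concept names sorted from best to worst score for a mention."""
    m = vectorize(mention, vocab)
    scored = [(score(W, m, v), name) for name, v in zip(names, name_vecs)]
    return [name for _, name in sorted(scored, key=lambda t: -t[0])]

# Hypothetical toy data, not drawn from MEDIC or the NCBI disease corpus.
names = ["breast cancer", "colorectal cancer", "diabetes mellitus"]
training = [("mammary carcinoma", 0), ("bowel carcinoma", 1), ("mammary tumor", 0)]
vocab = build_vocab(names + [m for m, _ in training])
W, name_vecs = train(training, names, vocab)
print(rank(W, "mammary cancer", names, name_vecs, vocab))
```

In this toy setup, the unseen mention "mammary cancer" shares only the token "cancer" with the two cancer names, so under the identity matrix they would tie. Training adds a positive weight between "mammary" and "breast", which is what lets the ranking separate them; this is the kind of term-variation handling that learning the similarity from data is meant to provide.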
Related articles
NCBI at 2013 ShARe/CLEF eHealth Shared Task: Disorder Normalization in Clinical Notes with DNorm
We describe an application of DNorm – a mathematically principled and high performing methodology for disease recognition and normalization, even in the presence of term variation – to clinical notes. DNorm consists of a text processing pipeline, including the BANNER named entity recognizer to locate diseases in the text, and a novel machine learning approach based on pairwise learning to rank ...
Automated Disease Normalization with Low Rank Approximations
While machine learning methods for named entity recognition (mention-level detection) have become common, machine learning methods have rarely been applied to normalization (concept-level identification). Recent research introduced a machine learning method for normalization based on pairwise learning to rank. This method, DNorm, uses a linear model to score the similarity between mentions and ...
Challenges in clinical natural language processing for automated disorder normalization
BACKGROUND Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated lower performance of disorder named entity recognition (NER) and normalization (or grounding) in clinical narratives than in biomedical publications. In this work, w...
Passive and Active Ranking from Pairwise Comparisons
In the problem of ranking from pairwise comparisons, the learner has access to pairwise preferences among n objects and is expected to output a total order of these objects. This problem has a wide range of applications not only in computer science but also in other areas such as social science and economics. In this report, we will give a survey of passive and active learning algorithms for ra...
Active Learning to Rank using Pairwise Supervision
This paper investigates learning a ranking function using pairwise constraints in the context of human-machine interaction. As the performance of a learnt ranking model is predominantly determined by the quality and quantity of training data, in this work we explore an active learning to rank approach. Furthermore, since humans may not be able to confidently provide an order for a pair of simil...
Journal title:
Volume 29, Issue
Pages -
Publication date 2013